Isabel Segura-Bedmar, Víctor Suárez-Paniagua, Paloma Martínez
Computer Science Department
University Carlos III of Madrid, Spain
This paper describes a machine learning-based
approach that uses word embedding
features to recognize drug names from
biomedical texts. As a starting point,
we developed a baseline system based on a
Conditional Random Field (CRF) trained
with standard features used in current
Named Entity Recognition (NER) systems.
Then, the system was extended to
incorporate new features, such as word
vectors and word clusters generated by
the Word2Vec tool and a lexicon feature
from the DINTO ontology. We trained the
Word2Vec tool on two different corpora:
Wikipedia and MedLine. Our main goal
is to study the effectiveness of using word
embeddings as features to improve the performance
of our baseline system, as well as
to analyze whether the DINTO ontology
could be a valuable complementary data
source integrated in a machine learning
NER system. To evaluate our approach
and compare it with previous work, we
conducted a series of experiments on the
dataset of SemEval-2013 Task 9.1 Drug
Name Recognition.
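
To make the feature design concrete, the following is a minimal sketch of how per-token features for a CRF tagger could combine standard NER features with word-vector, word-cluster, and lexicon features. It is an illustration under stated assumptions, not the system described in this paper: the function names, feature names, toy vectors, cluster identifiers, and lexicon entries are all hypothetical, and in practice the vectors and clusters would be derived from a trained Word2Vec model and the lexicon from DINTO.

```python
from typing import Dict, List, Sequence, Set


def token_features(
    tokens: Sequence[str],
    i: int,
    vectors: Dict[str, Sequence[float]],
    clusters: Dict[str, int],
    lexicon: Set[str],
) -> Dict[str, float]:
    """Build a feature dictionary for the token at position i.

    All feature names here are hypothetical choices for illustration.
    """
    word = tokens[i]
    lower = word.lower()

    # Standard surface features of the kind used in many NER systems.
    feats: Dict[str, float] = {
        "bias": 1.0,
        "lower=" + lower: 1.0,
        "suffix3=" + lower[-3:]: 1.0,
        "is_title": float(word.istitle()),
        "has_digit": float(any(c.isdigit() for c in word)),
    }

    # Word-vector features: one real-valued feature per embedding dimension.
    # Tokens without a vector simply get no embedding features.
    for d, value in enumerate(vectors.get(lower, ())):
        feats[f"emb_{d}"] = float(value)

    # Word-cluster feature: the identifier of the cluster the word falls in.
    if lower in clusters:
        feats[f"cluster={clusters[lower]}"] = 1.0

    # Lexicon feature: whether the token appears in a drug-name list.
    feats["in_lexicon"] = float(lower in lexicon)
    return feats


def sentence_features(
    tokens: Sequence[str],
    vectors: Dict[str, Sequence[float]],
    clusters: Dict[str, int],
    lexicon: Set[str],
) -> List[Dict[str, float]]:
    """Return one feature dictionary per token in the sentence."""
    return [
        token_features(tokens, i, vectors, clusters, lexicon)
        for i in range(len(tokens))
    ]


# Toy resources (assumed values, not taken from any real model or ontology).
toy_vectors = {"aspirin": [0.12, -0.40], "ibuprofen": [0.10, -0.38]}
toy_clusters = {"aspirin": 7, "ibuprofen": 7}
toy_lexicon = {"aspirin"}

example = sentence_features(
    ["Aspirin", "interacts", "with", "ibuprofen"],
    toy_vectors,
    toy_clusters,
    toy_lexicon,
)
```

Feature dictionaries of this shape are a common input format for CRF toolkits. The sketch also shows why cluster features can help: a word missing from the lexicon ("ibuprofen" in the toy data) still shares a cluster with a known drug name.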